Query-friendly Compression and Indexing of Recurring Structures in XML Documents

Authors

  • Satish Vemula
  • Justin Hare
  • Seo-Young Noh
Abstract

XML documents are self-describing by design. To achieve this, XML data is highly verbose and repetitive. Although techniques already exist to compress XML and text in general, most do not keep the data in a form that is useful to users. We present a technique that exploits recurring structures within an XML document to compress the file in a way that can achieve better compression than other query-friendly compression techniques while still keeping the data in a form that supports both querying and indexing. We also present an example implementation of the technique, complete with an index-building mechanism and query-processing capabilities.

Similar resources

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

A Generic Framework for Querying and Updating Secondary XML Index Structures

To cope with the increasing number and size of XML documents, XML databases provide index structures to accelerate queries on the content and structure of documents. To adapt indices to the query workload, XML databases require various secondary index structures. This paper presents a generic index framework called sciens (Structure and Content Indexing with Extensible, Nestable Structures). In...

XIQS: An XML Indexing and Query System

Retrieval from XML data sets is an actively researched field that presents some different problems from retrieval of relational databases. The challenges stem from the characteristics of the tree structures of XML data. In this paper we present a system, XIQS, for XML query processing with an indexing strategy. Internal data structures are built based on the data type definitions (DTD) of the X...

XSeq: An Index Infrastructure for Tree Pattern Queries

Given a tree-pattern query, most XML indexing approaches decompose it into multiple sub-queries, and then join their results to provide the answer to the original query. Join operations have been identified as the most time-consuming component in XML query processing. XSeq is a powerful XML indexing infrastructure which makes tree patterns a first class citizen in XML query processing. Unlike m...

Indexing and Searching XML Documents Based on Content and Structure Synopses

We present a novel framework for indexing and searching schema-less XML documents based on concise summaries of their structural and textual content. Our search query language is XPath extended with full-text search. We introduce two novel data synopsis structures that correlate textual with positional information in an XML document and improve query precision. In addition, we present a two-ph...

Publication date: 2003